1
Use sex and region to create a
count plot to see what combination of sex and region
has the highest number of observations. In other words, where do we
observe the highest number of observations and for which gender?
library(ggplot2)
ggplot(df) + geom_count(aes(x = region, y = sex), color = "blue")

The southeast region has the highest number of observations, and within
the southeast, males are the largest group.
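The reading of the count plot can be cross-checked numerically with a cross-tabulation. The following is a minimal sketch, assuming the data has the region and sex columns used above. The small toy data frame is made up purely so the snippet runs on its own; it is not the assignment data, and the names toy, counts, counts_long and top are illustrative.

```r
# Hypothetical stand-in for df (NOT the assignment data), so the snippet runs alone.
toy <- data.frame(
  region = c("southeast", "southeast", "southeast",
             "northwest", "northeast", "southwest"),
  sex    = c("male", "male", "female", "female", "male", "female")
)

# Cross-tabulate the two categorical variables: one cell per region/sex pair.
counts <- table(toy$region, toy$sex)

# Convert to a long data frame and pick the cell with the largest count.
counts_long <- as.data.frame(counts, responseName = "n")
names(counts_long)[1:2] <- c("region", "sex")
top <- counts_long[which.max(counts_long$n), ]
print(top)
```

Replacing toy with df should give the counts that the point sizes in the plot represent.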
2
Does the link between bmi and
charges vary based on smoker?
1. Create a point plot between bmi and charges and color the points based on smoker.
2. Next add 2 linear trend lines to the plot using geom_smooth and make sure to color them based on smoker.
3. Change the axes names to Body Mass Index and Dollar Charged.
4. Add the title of the graph to read The link between BMI and Charges: Smokers and non-smokers.
5. What do you conclude based on this graph?
ggplot(df, aes(x = bmi, y = charges, color = smoker)) +
  geom_point() +
  # One linear fit per smoker group, because color is mapped to smoker
  geom_smooth(method = "lm") +
  labs(title = "The link between BMI and Charges: Smokers and non-smokers") +
  xlab("Body Mass Index") +
  ylab("Dollar Charged")

This plot shows that BMI plays a bigger role for smokers: their charges
rise steeply as BMI increases. For non-smokers, the increase in charges
at higher BMI is much smaller.
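The difference between the two trend lines can also be expressed as numbers by fitting a linear model with an interaction term, which yields the same per-group slopes as two separate linear fits. The following is a minimal sketch, assuming the bmi, charges and smoker columns used above, with smoker taking the values "no" and "yes". The toy data frame is invented only so the snippet runs on its own; it is not the assignment data, and its values say nothing about the real slopes.

```r
# Hypothetical stand-in for df (NOT the assignment data); values chosen only
# so that the two groups have clearly different slopes.
toy <- data.frame(
  bmi     = c(20, 25, 30, 35, 20, 25, 30, 35),
  smoker  = rep(c("no", "yes"), each = 4),
  charges = c(5000, 5500, 6000, 6500, 15000, 22000, 29000, 36000)
)

# bmi * smoker expands to bmi + smoker + bmi:smoker,
# so each smoker group gets its own intercept and slope.
fit <- lm(charges ~ bmi * smoker, data = toy)

# Slope for non-smokers is the bmi coefficient;
# for smokers, add the interaction coefficient to it.
slope_no  <- unname(coef(fit)["bmi"])
slope_yes <- unname(coef(fit)["bmi"] + coef(fit)["bmi:smokeryes"])
print(c(non_smoker = slope_no, smoker = slope_yes))
```

Fitting the same formula with data = df would give the change in charges per unit of BMI for each group in the actual data.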
3
Create a stacked bar chart for each region and fill the bars based on
smoker. Next flip the coordinates (axes) and also change the theme of
the graph to minimal. Add the count of the observations to each bar and
adjust them to see each number in the correct position (note that we
should see 8 numbers in the graph!). Next, instead of letting R choose
the colors for the bars, use the function scale_fill_brewer() and a
palette called “Dark2” to change the color of the bars.
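ggplot(df) +
  aes(x = region, fill = smoker) +
  geom_bar() +
  scale_fill_brewer(palette = "Dark2") +
  coord_flip() +
  theme_minimal() +
  # after_stat() replaces the deprecated stat(); position_stack() centers
  # each label inside its own bar segment
  geom_text(stat = "count",
            aes(label = after_stat(count)),
            position = position_stack(vjust = 0.5))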

4
Using the Plotly package in R, create an interactive chart that shows
the scatter plot between bmi (in the horizontal axis), charges (in the
vertical axis), and color the dots based on regions.
library(plotly)
# Set the trace type and mode explicitly rather than letting plotly guess them
plot_ly(
  df,
  x = ~bmi,
  y = ~charges,
  color = ~region,
  type = "scatter",
  mode = "markers"
)